Add update.py script to combine download and convert functionality with --update flag #13

Draft
Copilot wants to merge 3 commits into main from copilot/add-update-script-functionality

Copilot AI commented Oct 22, 2025

Overview

This PR adds a new scripts/update.py script that combines the functionality of both download.py and convert_transcripts.py, streamlining the workflow for managing YouTube transcripts. The script downloads transcripts directly into the Jekyll site structure, eliminating the need to run two separate scripts.

Key Features

Combined Download + Convert Workflow

The new script integrates the YouTube download functionality from download.py with the Jekyll post generation from convert_transcripts.py. Instead of:

  1. Running download.py to fetch transcripts into ./transcripts/
  2. Running convert_transcripts.py to convert them to Jekyll format

You can now simply run:

python3 scripts/update.py

This will:

  • Fetch video metadata and transcripts from the YouTube channel
  • Directly create Jekyll posts in _posts/ with proper YAML front matter
  • Copy VTT caption files to _includes/captions/ named by YouTube ID
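As an illustration of the two output locations listed above, here is a minimal Python sketch. The function name `create_post`, the keys of the `video` dict, and the `YYYY-MM-DD-<youtube_id>.md` filename scheme are assumptions for illustration and not necessarily what update.py does; only the `_posts/` directory and the `_includes/captions/<youtube_id>.vtt` naming come from this PR.

```python
import json
import shutil
from pathlib import Path


def create_post(site_root: Path, video: dict, vtt_source: Path) -> Path:
    """Write a Jekyll post and copy its VTT caption file (hypothetical helper).

    Assumes `video` has youtube_id, title, upload_date (YYYYMMDD),
    view_count and like_count keys; the filename scheme is an assumption.
    """
    d = video["upload_date"]
    posts_dir = site_root / "_posts"
    captions_dir = site_root / "_includes" / "captions"
    posts_dir.mkdir(parents=True, exist_ok=True)
    captions_dir.mkdir(parents=True, exist_ok=True)

    post_path = posts_dir / f"{d[:4]}-{d[4:6]}-{d[6:]}-{video['youtube_id']}.md"
    lines = [
        "---",
        # json.dumps yields a double-quoted string that YAML also accepts
        f"title: {json.dumps(video['title'], ensure_ascii=False)}",
        f"youtube_id: {video['youtube_id']}",
        f"view_count: {int(video['view_count'])}",
        f"like_count: {int(video['like_count'])}",
        "---",
        "",
    ]
    post_path.write_text("\n".join(lines), encoding="utf-8")

    # The caption file is named only by the YouTube ID
    shutil.copyfile(vtt_source, captions_dir / f"{video['youtube_id']}.vtt")
    return post_path
```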

Update Mode with --update Flag

With the --update flag, the script also refreshes the view and like counts of existing posts:

python3 scripts/update.py --update

This feature:

  • Scans all existing posts in _posts/ to extract YouTube IDs
  • For new videos: Downloads and creates new posts with VTT files
  • For existing videos: Fetches current metadata and updates only view_count and like_count fields
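The update pass described above can be sketched as: read the front matter of every post, collect its `youtube_id`, split the channel's IDs into new and existing, and rewrite only the two count lines. This is a hypothetical sketch: `existing_ids`, `plan` and `update_counts` are illustrative names, and it assumes `---`-delimited front matter with LF line endings and one `youtube_id:` line per post.

```python
import re
from pathlib import Path

# Front matter block at the very top of a post: "---\n...\n---\n"
FRONT_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
# A line such as: youtube_id: abc123  (optionally quoted)
ID_RE = re.compile(r"^youtube_id:[ \t]*[\"']?([\w-]+)[\"']?[ \t]*$", re.MULTILINE)


def existing_ids(posts_dir: Path) -> dict:
    """Map each youtube_id found in _posts/*.md front matter to its post file."""
    found = {}
    for post in sorted(posts_dir.glob("*.md")):
        m = FRONT_RE.match(post.read_text(encoding="utf-8"))
        if m:
            id_m = ID_RE.search(m.group(1))
            if id_m:
                found[id_m.group(1)] = post
    return found


def plan(channel_ids: list, known: dict) -> tuple:
    """Split channel IDs into (new, existing) relative to the known posts."""
    new = [v for v in channel_ids if v not in known]
    existing = [v for v in channel_ids if v in known]
    return new, existing


def update_counts(post: Path, view_count: int, like_count: int) -> bool:
    """Rewrite only the view_count and like_count lines inside the front matter.

    Fields that are missing are not added; the post body is left untouched.
    """
    text = post.read_text(encoding="utf-8")
    m = FRONT_RE.match(text)
    if not m:
        return False
    fm = re.sub(r"^view_count:.*$", f"view_count: {int(view_count)}", m.group(1), flags=re.MULTILINE)
    fm = re.sub(r"^like_count:.*$", f"like_count: {int(like_count)}", fm, flags=re.MULTILINE)
    post.write_text(f"---\n{fm}\n---\n" + text[m.end():], encoding="utf-8")
    return True
```

Restricting the edits to the front matter keeps a body line that happens to start with `view_count:` from being rewritten.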

Additional Options

# Start processing from a specific video (useful for resuming interrupted runs)
python3 scripts/update.py --offset 50
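A sketch of how the two flags might be declared with the standard `argparse` module. Only `--update` and `--offset` come from this PR; the help text, the default offset of 0, the zero-based meaning of the offset, and the `select_videos` helper are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line (flag names from the PR; defaults assumed)."""
    parser = argparse.ArgumentParser(
        description="Download YouTube transcripts directly into the Jekyll site."
    )
    parser.add_argument("--update", action="store_true",
                        help="also refresh view_count and like_count on existing posts")
    parser.add_argument("--offset", type=int, default=0,
                        help="zero-based index of the first video to process")
    return parser.parse_args(argv)


def select_videos(videos: list, offset: int) -> list:
    """Skip the first `offset` entries, e.g. to resume an interrupted run."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return videos[offset:]
```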

Documentation Updates

Updated scripts/README.md with documentation for both download.py and update.py, following the same structure as the existing convert_transcripts.py section. Each section includes:

  • What the script does
  • Usage examples with commands
  • Available options
  • Output format descriptions
  • Important notes and requirements

Testing

The following behavior was tested:

  • ✅ Correctly identifies 349 existing posts by scanning for youtube_id values
  • ✅ Successfully updates view_count and like_count in YAML front matter
  • ✅ Properly creates Jekyll posts with escaped special characters
  • ✅ Handles multi-line descriptions using YAML literal block syntax
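One way to get both behaviors listed above (escaped special characters and literal blocks for multi-line descriptions) is sketched below. This is an assumption, not the PR's code: it relies on JSON strings also being valid YAML double-quoted scalars, and it uses a plain `|` block, which would need an explicit indentation indicator if a description's first line began with spaces. The name `yaml_field` is hypothetical.

```python
import json


def yaml_field(key: str, value: str, indent: str = "  ") -> str:
    """Render one front-matter field (hypothetical helper).

    Single-line values become double-quoted strings with quotes, colons and
    backslashes escaped; multi-line values use a YAML literal block (|).
    """
    if "\n" in value:
        # Indent every non-empty line; blank lines stay empty inside the block
        body = "\n".join(indent + line if line else "" for line in value.splitlines())
        return f"{key}: |\n{body}"
    return f"{key}: {json.dumps(value, ensure_ascii=False)}"
```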
  • ✅ CodeQL security scan passed with zero vulnerabilities

Files Changed

  • scripts/update.py (NEW) - 392 lines combining download and convert functionality
  • scripts/README.md - Added documentation sections for download.py and update.py
  • .gitignore - Added Python cache file patterns (__pycache__/, *.py[cod], etc.)

This replaces the two-step download-then-convert process with a single tool that handles both initial downloads and ongoing updates of the transcript site.

Original prompt

Add a new third Python script file ./scripts/update.py which combines the functionality of both existing Python scripts in that folder, modifying the download.py code to obtain the YouTube video transcripts but using the structure created by convert_transcripts.py directly, so that the second script is not needed. Then add an --update arg that updates the site structure by building a list of all existing YouTube IDs (youtube_id): for new YouTube IDs that do not already have a ./_posts file and a ./_includes/captions/ VTT file, create them, and for all existing YouTube IDs, update only the view_count and like_count properties in each ./_posts file. Then update ./scripts/README.md to add new sections for download.py and update.py using the same structure as the existing section for convert_transcripts.py.


Copilot AI and others added 2 commits October 22, 2025 14:03
Co-authored-by: davelab6 <261579+davelab6@users.noreply.github.com>
Co-authored-by: davelab6 <261579+davelab6@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add update script to combine functionality of existing scripts" to "Add update.py script to combine download and convert functionality with --update flag" on Oct 22, 2025
Copilot AI requested a review from davelab6 October 22, 2025 14:10